{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EvD_fmQGKwm7"
      },
      "source": [
        "# 1️⃣ Training an Adapter for a Transformer Model\n",
        "\n",
        "In this notebook, we train an adapter for a **RoBERTa** ([Liu et al., 2019](https://arxiv.org/pdf/1907.11692.pdf)) model for sequence classification on a **sentiment analysis** task using the _[Adapters](https://github.com/Adapter-Hub/adapters)_ library and Hugging Face's _Transformers_ library.\n",
        "\n",
        "We train a **[bottleneck adapter](https://docs.adapterhub.ml/methods.html#bottleneck-adapters)** on top of a pre-trained model here. Most of the code is identical to a full fine-tuning setup using _Transformers_.\n",
        "\n",
        "For training, we use the [movie review dataset by Pang and Lee (2005)](http://www.cs.cornell.edu/people/pabo/movie-review-data/). It contains movie reviews from Rotten Tomatoes which are either classified as positive or negative. We download the dataset via Hugging Face's [_Datasets_](https://github.com/huggingface/datasets) library."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zcuSyuzjKzoW"
      },
      "source": [
        "## Installation\n",
        "\n",
        "First, let's install the required libraries:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3P8cRBAXK1fW",
        "outputId": "f8ee713e-bb3b-4505-a28e-dec6cc18bbd1"
      },
      "outputs": [],
      "source": [
        "!pip install -qq adapters datasets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DjiwZ8a0K7mJ"
      },
      "source": [
        "## Dataset Preprocessing\n",
        "\n",
        "Before we start to train our adapter, we first prepare the training data. Our training dataset can be loaded via HuggingFace `datasets` using one line of code:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "LCLS9RYeK83s",
        "outputId": "f95103b4-7cd5-47d4-81ad-ac60ad2188fe"
      },
      "outputs": [],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "dataset = load_dataset(\"rotten_tomatoes\")\n",
        "dataset.num_rows"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8DnDtZHpK-_t"
      },
      "source": [
        "Every dataset sample has an input text and a binary label:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "huLjPAKHLA1g",
        "outputId": "cdeee448-f3b6-4fa1-9d93-1248c0ccddad"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'text': 'the rock is destined to be the 21st century\\'s new \" conan \" and that he\\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',\n",
              " 'label': 1}"
            ]
          },
          "execution_count": 2,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset['train'][0]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sYD2HAvSLFwA"
      },
      "source": [
        "Now, we need to encode all dataset samples to valid inputs for our Transformer model. Since we want to train on `roberta-base`, we load the corresponding `RobertaTokenizer`. Using `dataset.map()`, we can pass the full dataset through the tokenizer in batches:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 81,
          "referenced_widgets": [
            "870a8abde9324399aefa80d5cf857471",
            "f062d0cfdec94bdd82be0dbca56e77c6",
            "f85fe7b430494eb487a45ae6612d3b6d",
            "6e681b421f0f4f04af4b7d3b851f30b6",
            "ad15990f3b694fe3a79ff23f98b4aeae",
            "0747e9f532744df9bcdf8290052fd68c",
            "9cf59f42817445a3bcfec061fb6dfa94",
            "712beb84767048ac90a1267214bb6c14",
            "97788e654d604cf384bf1255a849cc95",
            "11abd146fa8f4aa4b570e0d20c0548d8",
            "621a9c99f2974b788b6b33fa2100ef20",
            "d6f0885eb18148edb987763d2040c8c0",
            "72dcd49d52c647729257f3c459aa29e7",
            "56d0b8036c4044aa9d57dea2bfe58eb8",
            "ea676a8941e74397b653fa6b9d56e777",
            "12573b73f5b74df5a9ec0913e3d1a2d5",
            "6a729356a8d3491387f03a65f99002c6",
            "d319964060cb4ac0994b663676126c14",
            "1524f853d99049b3af3d37239fa048df",
            "7a8c8a3cd6cd44c9b9aa3e0fba78d915",
            "25d3ae9d969d4cbea1f5b1ceb7e8f76e",
            "dfc9c95a46154ed8b0413473e2605354"
          ]
        },
        "id": "oDHhicggLHW7",
        "outputId": "6d31408e-c0b9-41cd-8c22-a7a43595f813"
      },
      "outputs": [],
      "source": [
        "from transformers import RobertaTokenizer\n",
        "\n",
        "tokenizer = RobertaTokenizer.from_pretrained(\"roberta-base\")\n",
        "\n",
        "def encode_batch(batch):\n",
        "  \"\"\"Encodes a batch of input data using the model tokenizer.\"\"\"\n",
        "  return tokenizer(batch[\"text\"], max_length=80, truncation=True, padding=\"max_length\")\n",
        "\n",
        "# Encode the input data\n",
        "dataset = dataset.map(encode_batch, batched=True)\n",
        "# The transformers model expects the target class column to be named \"labels\"\n",
        "dataset = dataset.rename_column(original_column_name=\"label\", new_column_name=\"labels\")\n",
        "# Transform to pytorch tensors and only output the required columns\n",
        "dataset.set_format(type=\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bwl-pv89LJfa"
      },
      "source": [
        "Now we're ready to train our model..."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XgykXOBHLLFK"
      },
      "source": [
        "## Training\n",
        "\n",
        "We use a pre-trained RoBERTa model checkpoint from the Hugging Face Hub. We load it with [`AutoAdapterModel`](https://docs.adapterhub.ml/classes/models/auto.html), a class unique to `adapters`. In addition to regular _Transformers_ classes, this class comes with all sorts of adapter-specific functionality, allowing flexible management and configuration of multiple adapters and prediction heads. [Learn more](https://docs.adapterhub.ml/prediction_heads.html#adaptermodel-classes)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lENbN034LMOk",
        "outputId": "e3c7f52b-8c95-4d36-f02f-4707a623706c"
      },
      "outputs": [],
      "source": [
        "from transformers import RobertaConfig\n",
        "from adapters import AutoAdapterModel\n",
        "\n",
        "config = RobertaConfig.from_pretrained(\n",
        "    \"roberta-base\",\n",
        "    num_labels=2,\n",
        ")\n",
        "model = AutoAdapterModel.from_pretrained(\n",
        "    \"roberta-base\",\n",
        "    config=config,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QY4Xiy9iLOVO"
      },
      "source": [
        "**Here comes the important part!**\n",
        "\n",
        "We add a new adapter to our model by calling `add_adapter()`. We pass a name (`\"rotten_tomatoes\"`) and an adapter configuration. `\"seq_bn\"` denotes a [sequential bottleneck adapter](https://docs.adapterhub.ml/methods.html#bottleneck-adapters) configuration.\n",
        "_Adapters_ supports a diverse range of different adapter configurations. For example, `config=\"lora\"` can be passed for training a [LoRA](https://docs.adapterhub.ml/methods.html#lora) adapter, `config=\"prefix_tuning\"` for [prefix tuning](https://docs.adapterhub.ml/methods.html#prefix-tuning) or `config=\"loreft\"` for [LoReFT](https://docs.adapterhub.ml/methods.html#reft). You can find all currently supported configs [here](https://docs.adapterhub.ml/methods.html).\n",
        "\n",
        "Next, we add a binary classification head. It's convenient to give the prediction head the same name as the adapter. This allows us to activate both together in the next step. The `train_adapter()` method does two things:\n",
        "\n",
        "1. It freezes all weights of the pre-trained model, so only the adapter weights are updated during training.\n",
        "2. It activates the adapter and the prediction head such that both are used in every forward pass."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "saGtTGogLPVf"
      },
      "outputs": [],
      "source": [
        "# Add a new adapter\n",
        "model.add_adapter(\"rotten_tomatoes\", config=\"seq_bn\")\n",
        "# Alternatively, e.g.:\n",
        "# model.add_adapter(\"rotten_tomatoes\", config=\"lora\")\n",
        "\n",
        "# Add a matching classification head\n",
        "model.add_classification_head(\n",
        "    \"rotten_tomatoes\",\n",
        "    num_labels=2,\n",
        "    id2label={ 0: \"👎\", 1: \"👍\"}\n",
        "  )\n",
        "\n",
        "# Activate the adapter\n",
        "model.train_adapter(\"rotten_tomatoes\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zyLopENcLSEq"
      },
      "source": [
        "For training an adapter, we make use of the `AdapterTrainer` class built-in into _Adapters_. This class is largely identical to _Transformer_'s `Trainer`, with some helpful tweaks e.g. for checkpointing only adapter weights.\n",
        "\n",
        "We configure the training process using a `TrainingArguments` object and define a method that will calculate the evaluation accuracy in the end. We pass both, together with the training and validation split of our dataset, to the trainer instance.\n",
        "\n",
        "**Note the differences in hyperparameters compared to full fine-tuning.** Adapter training usually requires a few more training epochs than full fine-tuning."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "QoTEQbhQLTGB"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from transformers import TrainingArguments, EvalPrediction\n",
        "from adapters import AdapterTrainer\n",
        "\n",
        "training_args = TrainingArguments(\n",
        "    learning_rate=1e-4,\n",
        "    num_train_epochs=6,\n",
        "    per_device_train_batch_size=32,\n",
        "    per_device_eval_batch_size=32,\n",
        "    logging_steps=200,\n",
        "    output_dir=\"./training_output\",\n",
        "    overwrite_output_dir=True,\n",
        "    # The next line is important to ensure the dataset labels are properly passed to the model\n",
        "    remove_unused_columns=False,\n",
        ")\n",
        "\n",
        "def compute_accuracy(p: EvalPrediction):\n",
        "  preds = np.argmax(p.predictions, axis=1)\n",
        "  return {\"acc\": (preds == p.label_ids).mean()}\n",
        "\n",
        "trainer = AdapterTrainer(\n",
        "    model=model,\n",
        "    args=training_args,\n",
        "    train_dataset=dataset[\"train\"],\n",
        "    eval_dataset=dataset[\"validation\"],\n",
        "    compute_metrics=compute_accuracy,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LtSsZNJsLVk4"
      },
      "source": [
        "Start the training 🚀"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 416
        },
        "id": "isGxmG9LLnhs",
        "outputId": "511f8d82-ef8e-47c4-8dcd-a754f4a65cfe"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "2da09ed5df2d4357b8b7029ca599c5e2",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/1602 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{'loss': 0.5131, 'learning_rate': 8.75156054931336e-05, 'epoch': 0.75}\n",
            "{'loss': 0.3153, 'learning_rate': 7.503121098626716e-05, 'epoch': 1.5}\n",
            "{'loss': 0.2828, 'learning_rate': 6.254681647940075e-05, 'epoch': 2.25}\n",
            "{'loss': 0.271, 'learning_rate': 5.006242197253434e-05, 'epoch': 3.0}\n",
            "{'loss': 0.25, 'learning_rate': 3.7578027465667915e-05, 'epoch': 3.75}\n",
            "{'loss': 0.2428, 'learning_rate': 2.50936329588015e-05, 'epoch': 4.49}\n",
            "{'loss': 0.2469, 'learning_rate': 1.2609238451935081e-05, 'epoch': 5.24}\n",
            "{'loss': 0.2216, 'learning_rate': 1.2484394506866418e-07, 'epoch': 5.99}\n",
            "{'train_runtime': 305.6866, 'train_samples_per_second': 167.426, 'train_steps_per_second': 5.241, 'train_loss': 0.2928388835525096, 'epoch': 6.0}\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "TrainOutput(global_step=1602, training_loss=0.2928388835525096, metrics={'train_runtime': 305.6866, 'train_samples_per_second': 167.426, 'train_steps_per_second': 5.241, 'train_loss': 0.2928388835525096, 'epoch': 6.0})"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "trainer.train()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0BLPys_0LuI5"
      },
      "source": [
        "Looks good! Let's evaluate our adapter on the validation split of the dataset to see how well it learned:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 141
        },
        "id": "hnct-N6dLvOd",
        "outputId": "a0b951c7-16ac-4958-8b6d-df8298157f2d"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "1d8d71a940884516978427285a4927fc",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/34 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "{'eval_loss': 0.2768324911594391,\n",
              " 'eval_acc': 0.8902439024390244,\n",
              " 'eval_runtime': 3.1694,\n",
              " 'eval_samples_per_second': 336.339,\n",
              " 'eval_steps_per_second': 10.728,\n",
              " 'epoch': 6.0}"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "trainer.evaluate()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XTtjRB_VNJbL"
      },
      "source": [
        "We can put our trained model into a _Transformers_ pipeline to be able to make new predictions conveniently:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "x-BYe1jeNK98",
        "outputId": "1df24b91-cc26-4511-b496-10bae08b610f"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "The model 'RobertaAdapterModel' is not supported for . Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceClassification', 'ErnieForSequenceClassification', 'ErnieMForSequenceClassification', 'EsmForSequenceClassification', 'FalconForSequenceClassification', 'FlaubertForSequenceClassification', 'FNetForSequenceClassification', 'FunnelForSequenceClassification', 'GPT2ForSequenceClassification', 'GPT2ForSequenceClassification', 'GPTBigCodeForSequenceClassification', 'GPTNeoForSequenceClassification', 'GPTNeoXForSequenceClassification', 'GPTJForSequenceClassification', 'IBertForSequenceClassification', 'LayoutLMForSequenceClassification', 'LayoutLMv2ForSequenceClassification', 'LayoutLMv3ForSequenceClassification', 'LEDForSequenceClassification', 'LiltForSequenceClassification', 'LlamaForSequenceClassification', 'LongformerForSequenceClassification', 'LukeForSequenceClassification', 'MarkupLMForSequenceClassification', 'MBartForSequenceClassification', 'MegaForSequenceClassification', 'MegatronBertForSequenceClassification', 'MistralForSequenceClassification', 'MobileBertForSequenceClassification', 'MPNetForSequenceClassification', 'MptForSequenceClassification', 'MraForSequenceClassification', 'MT5ForSequenceClassification', 'MvpForSequenceClassification', 'NezhaForSequenceClassification', 'NystromformerForSequenceClassification', 'OpenLlamaForSequenceClassification', 'OpenAIGPTForSequenceClassification', 'OPTForSequenceClassification', 'PerceiverForSequenceClassification', 'PersimmonForSequenceClassification', 'PLBartForSequenceClassification', 'QDQBertForSequenceClassification', 'ReformerForSequenceClassification', 'RemBertForSequenceClassification', 'RobertaForSequenceClassification', 'RobertaPreLayerNormForSequenceClassification', 'RoCBertForSequenceClassification', 'RoFormerForSequenceClassification', 'SqueezeBertForSequenceClassification', 'T5ForSequenceClassification', 'TapasForSequenceClassification', 'TransfoXLForSequenceClassification', 'UMT5ForSequenceClassification', 'XLMForSequenceClassification', 'XLMRobertaForSequenceClassification', 'XLMRobertaXLForSequenceClassification', 'XLNetForSequenceClassification', 'XmodForSequenceClassification', 'YosoForSequenceClassification'].\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "[{'label': '👍', 'score': 0.98734450340271}]"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from transformers import TextClassificationPipeline\n",
        "\n",
        "classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=training_args.device.index)\n",
        "\n",
        "classifier(\"This is awesome!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ANVQrhIsNPOu"
      },
      "source": [
        "At last, we can also extract the adapter from our model and separately save it for later reuse. Note the size difference compared to a full model!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "50FhEtBrNP_C",
        "outputId": "f50f64da-cf6e-4456-de35-d0ceaebbd677"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Der Befehl \"ls\" ist entweder falsch geschrieben oder\n",
            "konnte nicht gefunden werden.\n"
          ]
        }
      ],
      "source": [
        "model.save_adapter(\"./final_adapter\", \"rotten_tomatoes\")\n",
        "\n",
        "!ls -lh final_adapter"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5sAJKaQeNREg"
      },
      "source": [
        "**Share your work!**\n",
        "\n",
        "The final step after successful training is to share our adapter with the world!\n",
        "_Adapters_ seamlessly integrates with the [Hugging Face Model Hub](https://huggingface.co/models), so you can publish your trained adapter with a single method call:\n",
        "\n",
        "**Important:** Make sure you're properly authenticated with your Hugging Face account before running this method. You can log in by running `huggingface-cli login` on your terminal."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model.push_adapter_to_hub(\n",
        "    \"my-awesome-adapter\",\n",
        "    \"rotten_tomatoes\",\n",
        "    datasets_tag=\"rotten_tomatoes\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This will create a repository _my-awesome-adapter_ under your username, generate a default adapter card as README.md and upload the adapter named `rotten_tomatoes` together with the adapter card to the new repository. [Learn more](https://docs.adapterhub.ml/huggingface_hub.html)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "➡️ Continue with [the next Colab notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/main/notebooks/02_Adapter_Inference.ipynb) to learn how to use adapters from the Hub."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
